AIDA-light: High-Throughput Named-Entity Disambiguation
Authors
Abstract
To advance the Web of Linked Data, mapping ambiguous names in structured and unstructured contents onto knowledge bases would be a vital asset. State-of-the-art methods for Named Entity Disambiguation (NED) face major tradeoffs regarding efficiency/scalability vs. accuracy. Fast methods use relatively simple context features and avoid computationally expensive algorithms for joint inference. While doing very well on prominent entities in clear input texts, these methods achieve only moderate accuracy when fed with difficult inputs. On the other hand, methods that rely on rich context features and joint inference for mapping names onto entities pay the price of being much slower. This paper presents AIDA-light, which achieves high accuracy on difficult inputs while also being fast and scalable. AIDA-light uses a novel kind of two-stage mapping algorithm. It first identifies a set of “easy” mentions with low ambiguity and links them to entities in a very efficient manner. This stage also determines the thematic domain of the input text as an important and novel kind of feature. The second stage harnesses the high-confidence linkage for the “easy” mentions to establish more reliable contexts for the disambiguation of the remaining mentions. Our experiments with four different datasets demonstrate that the accuracy of AIDA-light is competitive with the very best NED systems, while its run-time is comparable to or better than that of the fastest systems.
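The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not the AIDA-light implementation: the candidate table `CANDIDATES`, the function `disambiguate`, the parameters `easy_threshold` and `domain_boost`, and the additive scoring are all assumptions made for illustration only. The sketch links "easy" mentions by a popularity prior, derives a thematic domain from those links, and then uses that domain to re-score the remaining mentions.

```python
from collections import Counter

# Hypothetical toy knowledge base (not from the paper): each surface name maps
# to candidate entities given as (entity_id, prior_score, thematic_domain).
CANDIDATES = {
    "Kashmir": [("Kashmir_(song)", 0.3, "music"),
                ("Kashmir_(region)", 0.7, "geography")],
    "Led Zeppelin": [("Led_Zeppelin", 1.0, "music")],
    "Page": [("Jimmy_Page", 0.4, "music"),
             ("Larry_Page", 0.6, "technology")],
}


def disambiguate(mentions, easy_threshold=0.9, domain_boost=0.5):
    """Toy two-stage linking; thresholds and scoring are illustrative assumptions."""
    linked = {}
    hard = []

    # Stage 1: link "easy" mentions whose best candidate has a dominant prior.
    for mention in mentions:
        candidates = CANDIDATES.get(mention, [])
        if not candidates:
            continue  # unknown name: leave unlinked
        best = max(candidates, key=lambda c: c[1])
        if best[1] >= easy_threshold:
            linked[mention] = best
        else:
            hard.append(mention)

    # Thematic domain: the most frequent domain among the easy links, if any.
    domain_counts = Counter(entity[2] for entity in linked.values())
    domain = domain_counts.most_common(1)[0][0] if domain_counts else None

    # Stage 2: re-score the remaining mentions, favouring the detected domain.
    for mention in hard:
        linked[mention] = max(
            CANDIDATES[mention],
            key=lambda c: c[1] + (domain_boost if c[2] == domain else 0.0),
        )

    return {m: entity[0] for m, entity in linked.items()}, domain


if __name__ == "__main__":
    links, domain = disambiguate(["Kashmir", "Led Zeppelin", "Page"])
    print(domain, links)
```

In this toy setting, the unambiguous mention "Led Zeppelin" fixes the domain, which then tips the two ambiguous mentions toward their music-related candidates. Without any easy mention, the sketch falls back to the prior alone.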
Similar resources
Extending AIDA framework by incorporating coreference resolution on detected mentions and pruning based on popularity of an entity
Named Entity Disambiguation (NED) is gaining popularity due to its applications in the field of information extraction. Entity linking or Named Entity Disambiguation is the task of discovering entities such as persons, locations, organizations, etc. and is challenging due to the high ambiguity of entity names in natural language text. In this paper, we propose a modification to the existing sta...
AIDA: An Online Tool for Accurate Disambiguation of Named Entities in Text and Tables
We present AIDA, a framework and online tool for entity detection and disambiguation. Given a natural-language text or a Web table, we map mentions of ambiguous names onto canonical entities like people or places, registered in a knowledge base like DBpedia, Freebase, or YAGO. AIDA is a robust framework centred around collective disambiguation exploiting the prominence of entities, similarity b...
U-AIDA: a customizable system for named entity recognition, classification, and disambiguation
Recognizing and disambiguating entities such as people, organizations, events or places in natural language text are essential steps for many linguistic tasks such as information extraction and text categorization. A variety of named entity disambiguation methods have been proposed, but most of them focus on Wikipedia as a sole knowledge resource. This focus does not fit all application scenari...
Combining Mention Context and Hyperlinks from Wikipedia for Named Entity Disambiguation
Named entity disambiguation is the task of linking entity mentions to their intended referent, as represented in a Knowledge Base, usually derived from Wikipedia. In this paper, we combine local mention context and global hyperlink structure from Wikipedia in a probabilistic framework. Our results show that the two models of context, namely, words in the context and hyperlink pathways to other ...
Joint Named Entity Recognition and Disambiguation
Extracting named entities in text and linking extracted names to a given knowledge base are fundamental tasks in applications for text understanding. Existing systems typically run a named entity recognition (NER) model to extract entity names first, then run an entity linking model to link extracted names to a knowledge base. NER and linking models are usually trained separately, and the mutua...